Lopez: As Compton students ace tests, educators are baffled by Rep. Maxine Waters' snub of school bond

Los Angeles Times

Students walk on campus at Dominguez High School in Compton. A bond measure would provide millions of dollars to rebuild the school.


Optimal Gap-Dependent Regret for Private Stochastic Decision-Theoretic Online Learning

arXiv.org Machine Learning

We study stochastic decision-theoretic online learning with full information and event-level pure differential privacy. A COLT open problem of Hu and Mehta asks to determine the optimal gap-dependent regret rate for stochastic decision-theoretic online learning under pure event-level differential privacy. For $K$ actions, losses in $[0,1]$, and a unique best action separated from the second-best action by gap $\Delta_{\min}$, the known lower bound is of order $\frac{\log K}{\min\{\Delta_{\min},\varepsilon\}}$, or equivalently, up to universal constants, of order \[ \frac{\log K}{\Delta_{\min}}+\frac{\log K}{\varepsilon}. \] We give a horizon-free pure-DP algorithm and prove the explicit regret bound \[ \operatorname{Reg}_T \le 1000 \cdot \left(\frac{\log K}{\Delta_{\min}}+\frac{\log K}{\varepsilon}\right) \] for every horizon $T$. The numerical constant is not optimized. The algorithm partitions time into blocks of exponentially increasing size, plays a single action throughout each block, and chooses the next action by an exponential mechanism applied to a data-independent random prefix of the previous block. The random prefix converts block regret into a sum, over all prefix lengths, of softmax selection errors. A single entropy-potential argument controls all privacy-dominated large-gap actions at cost $\log K/\varepsilon$.
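
The block structure described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' algorithm: block sizes are $1, 2, 4, \dots$, the prefix length is drawn uniformly from the previous block, and the exponential mechanism samples action $a$ with probability proportional to $\exp(-\varepsilon L_a/2)$, where $L_a$ is the cumulative loss of $a$ on that prefix. The names `exp_mechanism` and `private_block_learner` are hypothetical.

```python
import math
import random


def exp_mechanism(cum_losses, eps, rng):
    """Sample an action with probability proportional to exp(-eps * L_a / 2).

    One loss vector in [0, 1]^K moves every cumulative loss L_a by at most 1,
    so this single selection is eps-DP with respect to that round (assumed setup).
    """
    shift = min(cum_losses)  # subtracting a constant leaves the distribution unchanged
    weights = [math.exp(-eps * (loss - shift) / 2) for loss in cum_losses]
    return rng.choices(range(len(cum_losses)), weights=weights, k=1)[0]


def private_block_learner(losses, K, eps, seed=0):
    """Play one action per block; blocks have sizes 1, 2, 4, ... (assumed schedule).

    losses: list of length-K loss vectors with entries in [0, 1] (full information).
    Returns the list of actions played, one per round.
    """
    rng = random.Random(seed)
    actions = []
    action = rng.randrange(K)  # first block: data-independent uniform choice
    t, block_len = 0, 1
    while t < len(losses):
        block = losses[t:t + block_len]
        actions.extend([action] * len(block))  # same action for the whole block
        # Prefix length drawn independently of the data (uniform here, an assumption).
        prefix = rng.randint(1, len(block))
        cum = [sum(vec[a] for vec in block[:prefix]) for a in range(K)]
        action = exp_mechanism(cum, eps, rng)  # next block's action
        t += len(block)
        block_len *= 2
    return actions
```

Because the prefix length never depends on the losses, each round's loss vector enters at most one exponential-mechanism call, which is the intuition behind pure event-level privacy here; the regret analysis in the abstract is not reproduced by this sketch.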


The Sample Complexity of Multiclass and Sparse Contextual Bandits

arXiv.org Machine Learning

We study contextual bandits in the stochastic i.i.d. setting, where a learner observes contexts drawn from an unknown distribution, selects actions from a finite set $A$, and aims to identify an approximately optimal policy from a given class based on bandit feedback. Motivated by bandit multiclass classification with zero-one rewards, we focus on the $s$-sparse setting in which, for every context, the reward vector has $L_1$-norm at most $s \ll |A|$. Our main result is the design of algorithms that, with high probability, output an $\varepsilon$-optimal policy compared to policy class $\Pi$ using $\tilde{O}((s/\varepsilon^2 + |A|/\varepsilon)\log |\Pi|/\delta)$ samples. We extend this bound to general Natarajan classes and complement it with a matching lower bound (up to logarithmic factors), thereby closing a substantial gap left by prior work (Erez et al., 2024, 2025), which incurred an additional $\Theta(|A|^9)$ dependence. We obtain these results via two complementary approaches. First, we analyze contextual bandits through the lens of contextual decision making with structured observations, designing an exploration-by-optimization algorithm whose sample complexity is governed by the decision-estimation coefficient (DEC; Foster et al., 2021, 2022). We show that, with $s$-sparse rewards, the induced model class admits a sharp DEC bound that scales with $s$ and directly yields the optimal rate. Since this approach is largely information-theoretic and involves solving complex min-max optimization problems, we also develop a second, more specialized algorithmic method based on a low-variance exploration technique. This approach leads to concrete, tractable algorithms and naturally extends to contextual combinatorial semi-bandits, leading to improved sample complexity guarantees for bandit multiclass list classification.


Online Learning on Hidden-Convex Losses via Algorithmic Equivalence: Optimal Regret, Geometric Barrier, and Bandit Feedback

arXiv.org Machine Learning

We study adversarial online learning with hidden-convex losses, i.e., nonconvex losses that become convex after a nonlinear reparameterization. Ghai, Lu and Hazan (2022) proved that, under geometric and smoothness assumptions, online gradient descent (OGD) on such nonconvex losses approximately simulates online mirror descent (OMD) on the underlying convex losses with a suitable regularizer, yielding $\mathcal{O}(T^{2/3})$ regret. They left open whether the optimal $\Theta(\sqrt{T})$ regret from online convex optimization can be recovered in this hidden-convex setting. We answer this question affirmatively. More specifically, via a sharper discrete-time algorithmic equivalence argument, we prove that OGD achieves $\mathcal{O}(\sqrt{T})$ regret under the same assumptions, matching the optimal worst-case rate for adversarial online convex optimization. We also address another open question of Ghai, Lu and Hazan (2022) by clarifying the geometry required for this algorithmic equivalence. We replace the diagonal-Jacobian sufficient condition with a necessary-and-sufficient Hessian compatibility condition, thereby expanding the class of admissible reparameterizations. We complement our tight regret bound with a lower bound showing that the Hessian compatibility assumption is essential for OGD; when it fails, we construct a smooth reparameterization and an adversarial sequence of hidden-convex losses for which OGD suffers $\Omega(T)$ regret. Finally, we extend our analysis to one-point bandit feedback and prove a $\mathcal{O}(T^{3/4})$ expected regret bound for bandit OGD with spherical smoothing, matching its classical rate on convex losses.
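
For the bandit-feedback part, the classical one-point gradient estimator with spherical smoothing can be sketched as below: query $f_t$ at $y_t + \delta u_t$ with $u_t$ uniform on the unit sphere and use $\frac{d}{\delta} f_t(y_t + \delta u_t)\, u_t$ as the gradient. This is plain bandit OGD on a Euclidean ball (an assumed domain) written against a generic loss callable; it does not implement the paper's reparameterization or Hessian-compatibility analysis, and the step size `eta`, smoothing radius `delta`, and helper names are illustrative choices.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Uniform direction on the unit sphere via a normalized Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:
            return [x / norm for x in v]


def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius centred at 0."""
    norm = math.sqrt(sum(t * t for t in x))
    if norm <= radius:
        return list(x)
    return [t * radius / norm for t in x]


def bandit_ogd(losses, d, radius=1.0, eta=0.01, delta=0.1, seed=0):
    """One-point bandit OGD with spherical smoothing on a ball of radius `radius`.

    losses: sequence of callables f_t(point) -> float. Only the value at the
    queried point is used, which is all that bandit feedback reveals.
    Returns the final iterate and the observed losses.
    """
    assert 0.0 < delta < radius
    rng = random.Random(seed)
    y = [0.0] * d
    observed = []
    for f in losses:
        u = sample_unit_sphere(d, rng)
        x = [yi + delta * ui for yi, ui in zip(y, u)]  # stays inside the ball
        value = f(x)
        observed.append(value)
        # (d / delta) * f(x) * u is an unbiased gradient estimate of the
        # delta-smoothed loss at y.
        grad = [(d / delta) * value * ui for ui in u]
        # Project onto the shrunken ball so the next perturbed point is feasible.
        y = project_ball([yi - eta * gi for yi, gi in zip(y, grad)], radius - delta)
    return y, observed
```

Shrinking the projection radius by `delta` is what keeps every queried point feasible; the $\mathcal{O}(T^{3/4})$ rate for convex losses comes from balancing `eta` and `delta` against $T$.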


BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning

arXiv.org Machine Learning

Reinforcement learning with verifiable rewards has become a standard recipe for improving the reasoning abilities of large language models. Existing algorithms face a tradeoff between computational efficiency and sample efficiency in value estimation and policy learning. We introduce BASIS, a critic-free post-training algorithm designed to address this tradeoff. At each online training step, BASIS samples only one rollout per prompt, but leverages rich information across prompts in the entire batch to improve value function estimation. Our experiments demonstrate that BASIS reduces MSE in value function estimation by 69% compared to REINFORCE++, a representative single-rollout baseline, and achieves lower MSE with one rollout than group mean estimators with 8 rollouts. This improvement in value estimation translates to better policy optimization: using substantially less training time, BASIS achieves performance close to multi-rollout GRPO-type baselines and often outperforms single-rollout REINFORCE-type baselines.
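
The abstract does not spell out BASIS's estimator, so the sketch below only contrasts the two kinds of baseline it is compared against, under simplifying assumptions: a GRPO-style group-mean baseline (several rollouts per prompt) and a single-rollout baseline that, as an assumption here, uses the batch-mean reward for every prompt. Both omit the standard-deviation normalization that such methods typically apply, and the function names are hypothetical.

```python
from statistics import mean


def group_mean_advantages(rewards_per_prompt):
    """GRPO-style baseline: several rollouts per prompt, baseline = that prompt's mean reward."""
    advantages = []
    for group in rewards_per_prompt:
        baseline = mean(group)
        advantages.append([r - baseline for r in group])
    return advantages


def batch_mean_advantages(rewards):
    """Single-rollout baseline: one reward per prompt, baseline = mean over the whole batch."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]
```

With one rollout per prompt, the batch mean gives every prompt the same value estimate regardless of its difficulty, while the group mean adapts per prompt but costs several rollouts each; that is the efficiency tradeoff the abstract describes.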


An Effective-Rank Audit of Alignment-Induced Activation Shifts: Confound Control, Constructive Calibration, and Limits

arXiv.org Machine Learning

We audit alignment-induced shifts in residual-stream activations of three open-weight instruction-tuned LLMs (Llama-3.1-8B-Instruct, Gemma-2-9B-it, Qwen-2.5-7B-Instruct) using the effective rank of the alignment modification matrix on safety-relevant inputs, rho_eps := rank_eps(M_Ds)/d, which formalizes the single-refusal-direction observation of Arditi et al. (2024) as a continuous quantity. The paper has three contributions. (1) Confound-controlled measurement: a four-variant decomposition (M_naive, M_template, M_aligned, M_DiD) separates chat-template formatting, alignment-stage shift, and the refusal-mediating direction, and recovers the Arditi refusal direction on M_DiD at |cos| in {0.77, 0.86, 0.50} (Llama/Gemma/Qwen); chat-template-controlled rho_eps is {0.0029, 0.0048, 0.0044}, and the centered SVD residual is 4-7x larger. (2) Constructive calibration on a 3-layer MLP across rho_eps in {0.008, 0.17, 0.33, 0.40} exhibits a sweet-spot vs. brittle distinction: mild rank-maximization (lambda=5) buys ablation robustness, while strong regularization at the same nominal rho_eps (lambda=50) does not. rho_eps is a diagnostic for fragility, not a target whose mechanical inflation buys robustness. (3) Limits of rank-based diagnostics: (a) not safety-specific (LRH baseline is 2-3x the safety value); (b) SVD principal ordering does not match causal ordering (Llama u_2 inert despite ranking second; cumulative ablation non-monotone at k=5); (c) the spectral-gap hypothesis required to upgrade the O(rho_eps * d) achievability bound to a matching Mirsky-route lower bound fails empirically (1/90 Llama layer-reference pairs, 0/36 MLP combinations) and structurally (kappa_lb <= 2/(eps * r)). The matching lower bound remains an open problem.
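
The ratio rho_eps := rank_eps(M_Ds)/d can be computed from the singular values of M_Ds (for instance those returned by `numpy.linalg.svd(M, compute_uv=False)`). The abstract does not define rank_eps, so this sketch assumes an energy-based definition: the smallest k whose top-k squared singular values retain at least a (1 - eps) fraction of the squared Frobenius norm. Function names are hypothetical.

```python
def rank_eps(singular_values, eps):
    """Smallest k keeping >= (1 - eps) of the squared Frobenius norm (assumed definition)."""
    s2 = sorted((s * s for s in singular_values), reverse=True)
    total = sum(s2)
    if total == 0:
        return 0  # zero matrix: no directions needed
    kept = 0.0
    for k, v in enumerate(s2, start=1):
        kept += v
        if kept >= (1 - eps) * total:
            return k
    return len(s2)


def rho_eps(singular_values, d, eps):
    """Effective rank normalized by the residual-stream width d."""
    return rank_eps(singular_values, eps) / d
```

For example, `rho_eps([10, 1, 0.1], d=4096, eps=0.05)` returns 1/4096, since the top direction alone carries more than 95% of the energy; a single dominant direction is the regime the Arditi et al. refusal-direction observation describes.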


Selena Gomez is reportedly bringing her talents to award-winning director's new four-hour X-rated movie

FOX News

Don't let reports that Selena Gomez is going to be starring in an X-rated movie fool you. This isn't going to be a poorly produced amateur-level movie thrown together with someone who doesn't know what they're doing. It's also not a sex tape, for the folks who can't get their act together.


Artificial Intelligence glitch at Arizona college graduation sparks uproar from crowd

FOX News

President Tiffany Hernandez said the school was "using a new AI system as our reader" and called it "a lesson learned." Kurt Knutsson discusses growing public backlash against AI, including former Google CEO Eric Schmidt being booed at a University of Arizona commencement. He further discusses the development of artificial eggs that could revive dead species. I'll be honest with you guys, I don't know what to make of my feelings toward artificial intelligence, because my mood on the subject changes by the day.


Online Learning-to-Defer with Varying Experts

arXiv.org Machine Learning

Learning-to-Defer (L2D) methods route each query either to a predictive model or to external experts. While existing work studies this problem in batch settings, real-world deployments require handling streaming data, changing expert availability, and shifting expert distribution. We introduce the first online L2D algorithm for multiclass classification with bandit feedback and a dynamically varying pool of experts. Our method achieves regret guarantees of $O((n+n_e)T^{2/3})$ in general and $O((n+n_e)\sqrt{T})$ under a low-noise condition, where $T$ is the time horizon, $n$ is the number of labels, and $n_e$ is the number of distinct experts observed across rounds. The analysis builds on novel $\mathcal{H}$-consistency bounds for the online framework, combined with first-order methods for online convex optimization. Experiments on synthetic and real-world datasets demonstrate that our approach effectively extends standard Learning-to-Defer to settings with varying expert availability and reliability.
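
A common way to make the deferral decision in multiclass L2D is to score the $n$ labels and the experts jointly and take the argmax, deferring when an expert wins. The sketch below assumes that scoring rule and handles a varying expert pool by masking experts unavailable in the current round; it is not the paper's algorithm, which additionally learns the scores online from bandit feedback, and `route` is a hypothetical name.

```python
from typing import Sequence, Set, Tuple


def route(scores: Sequence[float], n_labels: int, available: Set[int]) -> Tuple[str, int]:
    """Return ("predict", label) or ("defer", expert_id).

    scores[:n_labels] score the model's own labels; scores[n_labels + j]
    scores expert j. Experts absent from `available` this round are skipped.
    Ties go to the model, then to the lower expert id.
    """
    best_label = max(range(n_labels), key=lambda i: scores[i])
    decision, best_score = ("predict", best_label), scores[best_label]
    for j in sorted(available):
        if scores[n_labels + j] > best_score:
            decision, best_score = ("defer", j), scores[n_labels + j]
    return decision
```

Masking rather than re-indexing keeps each expert's score slot fixed across rounds, so an expert that disappears and later returns is still scored by the same output.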


Screens would be banned until 2nd grade under draft LAUSD plan

Los Angeles Times

Children and parents at a recent L.A. Unified school board meeting where screen-time limits were discussed. The L.A. Board of Education got its first look at proposed screen-time limits for students, including a total ban until second grade.